A Comparison of Techniques for Classification and Ad Hoc Retrieval of Biomedical Documents
Abstract
We participated in both the classification and ad hoc retrieval tasks of the TREC 2005 Genomics Track. To better understand which text classification techniques lead to improved performance, we applied a set of general-purpose biomedical document classification systems to the four triage tasks, varying one system feature or text processing technique at a time. Our best and most consistent system consisted of a voting perceptron classifier, chi-square feature selection on full-text articles, binary feature weighting, stemming and stop-word removal, and pre-filtering based on the MeSH term Mice. This system approached, but did not surpass, the performance of the best TREC entry for each of the four tasks. Full text provided a substantial benefit over title plus abstract alone. Other common techniques, such as inverse document frequency (IDF) feature weighting and cosine normalization, were ineffective. For the ad hoc retrieval task, we used the Zettair search engine. Both of our submissions used the Okapi measure, with its parameters optimized on the sample topics that were provided. Two different query sets were used in our runs: one containing all the words from the topic file and the other containing only its keywords. Keyword-only queries consistently outperformed all-word queries. Optimizing the Okapi parameters improved our performance.
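The classification pipeline summarized above can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' implementation: we read "voting perceptron" as Freund and Schapire's voted perceptron, add a bias term, and pick the feature cutoff `k`, the epoch count, and all function names (`chi_square_scores`, `select_features`, `train_voted_perceptron`, `predict`, `triage`) as hypothetical choices. Documents are assumed to arrive as already stemmed and stop-worded term sets, which yields binary feature weighting, and each document's MeSH headings are assumed to be available as a set of strings.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A document is a set of terms, assumed already stemmed and stop-worded.
# Using sets gives binary feature weighting: only presence/absence counts.
Doc = Set[str]
Model = List[Tuple[Dict[str, float], float, int]]


def chi_square_scores(docs: List[Doc], labels: List[int]) -> Dict[str, float]:
    """Chi-square statistic of each term vs. a binary label (+1 / -1)."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    pos_df: Counter = Counter()
    neg_df: Counter = Counter()
    for doc, y in zip(docs, labels):
        (pos_df if y == 1 else neg_df).update(doc)
    scores: Dict[str, float] = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive docs containing the term
        b = neg_df[term]   # negative docs containing the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # negative docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_features(scores: Dict[str, float], k: int) -> Set[str]:
    """Keep the k terms with the highest chi-square scores."""
    return set(sorted(scores, key=scores.__getitem__, reverse=True)[:k])


def _score(w: Dict[str, float], bias: float, x: Doc) -> float:
    # Sparse dot product for binary features, plus a bias term (our assumption).
    return sum(w.get(t, 0.0) for t in x) + bias


def train_voted_perceptron(docs: List[Doc], labels: List[int],
                           vocab: Set[str], epochs: int = 10) -> Model:
    """Voted perceptron: store every weight vector with its survival count."""
    w: Dict[str, float] = {}
    bias, count = 0.0, 0
    model: Model = []
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            x = doc & vocab
            if y * _score(w, bias, x) <= 0:
                model.append((dict(w), bias, count))  # retire current vector
                for t in x:
                    w[t] = w.get(t, 0.0) + y
                bias += y
                count = 1
            else:
                count += 1
    model.append((dict(w), bias, count))
    return model


def predict(model: Model, doc: Doc, vocab: Set[str]) -> int:
    """Each stored vector votes sign(w.x + b), weighted by its survival count."""
    x = doc & vocab
    vote = sum(c * (1 if _score(w, b, x) > 0 else -1) for w, b, c in model)
    return 1 if vote > 0 else -1


def triage(doc: Doc, mesh_terms: Set[str], model: Model, vocab: Set[str]) -> int:
    """Pre-filter on the MeSH term Mice, then classify the survivors.

    Representing MeSH headings as a plain string set is an assumption.
    """
    if "Mice" not in mesh_terms:
        return -1
    return predict(model, doc, vocab)


# Toy usage with made-up documents (not TREC data).
docs = [{"mutant", "mice", "knockout"}, {"knockout", "phenotype", "allele"},
        {"protein", "structure", "crystal"}, {"crystal", "binding", "structure"}]
labels = [1, 1, -1, -1]
vocab = select_features(chi_square_scores(docs, labels), k=3)
model = train_voted_perceptron(docs, labels, vocab)
print(triage({"knockout", "mice"}, {"Mice"}, model, vocab))  # 1
```

Because each stored weight vector votes with a weight equal to the number of training examples it survived, long-lived vectors dominate the final decision. The Mice pre-filter rejects any article lacking that MeSH heading before the classifier is consulted, so the classifier only has to separate positives from negatives within the mouse literature.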
Similar resources
DEMIR at ImageCLEFMed 2013: The Effects of Modality Classification to Information Retrieval
This paper presents the details of the participation of the DEMIR (Dokuz Eylül University Multimedia Information Retrieval) research team in the ImageCLEF 2013 Medical Retrieval task. This year, we participated in two subtasks: modality classification and ad hoc image-based retrieval. For both, our central method is integrated combination multimodal retrieval applied to the retrieved document sets of each ...
Machine Learning for Information Retrieval
In this thesis, we explore the use of machine learning techniques for information retrieval. More specifically, we focus on ad-hoc retrieval, which is concerned with searching large corpora to identify the documents relevant to user queries. This identification is performed through a ranking task. Given a user query, an ad-hoc retrieval system ranks the corpus documents, so that the documents r...
Enhancing access to the Bibliome: the TREC 2004 Genomics Track
BACKGROUND The goal of the TREC Genomics Track is to improve information retrieval in the area of genomics by creating test collections that will allow researchers to improve and better understand failures of their systems. The 2004 track included an ad hoc retrieval task, simulating use of a search engine to obtain documents about biomedical topics. This paper describes the Genomics Track of t...
Comparison between Ad-hoc Retrieval and Filtering Retrieval Using Arabic Documents
We have selected 242 Arabic abstracts used by Hmeidi and Kanaan (1997), all of which involve computer science and information systems. We have also designed and built a new system to compare two different retrieval tasks: ad hoc retrieval and filtering retrieval. We have defined ad hoc and filtering retrieval systems and illustrated the development strategy for each system. We have com...
A Re-Ranking Method Based on Irrelevant Documents in Ad-Hoc Retrieval
In this paper, we propose a novel approach for document re-ranking, which relies on the concept of negative feedback represented by irrelevant documents. In a previous paper, a pseudo-relevance feedback method was introduced using an absorbing document d̃ that best fits the user's need. The document d̃ is orthogonal to the majority of irrelevant documents. In this paper, this document is used to ...
Publication date: 2005